# ViT Architecture

Vitmodel Skincheck
MIT
This is a Vision Transformer-based model for classifying facial skin types into 5 categories.
Image Classification Transformers English
belpin
61
1
Coco Instance Eomt Large 1280
MIT
A model that reinterprets the Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
105
0
Ade20k Panoptic Eomt Giant 1280
MIT
A model that reinterprets the Vision Transformer (ViT) as an image segmentation model, revealing ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
96
0
Ade20k Panoptic Eomt Large 1280
MIT
An image segmentation model based on the Vision Transformer (ViT), revealing the potential of ViT in image segmentation tasks.
Image Segmentation PyTorch
tue-mps
129
0
Coco Panoptic Eomt Large 1280
MIT
A model that treats the Vision Transformer (ViT) as an image segmentation model and explores its potential in image segmentation tasks.
Image Segmentation
tue-mps
119
0
Coco Panoptic Eomt Large 640
MIT
This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture for segmentation purposes.
Image Segmentation
tue-mps
217
0
Coco Instance Eomt Large 640
MIT
A model that reinterprets the Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
99
0
Coco Panoptic Eomt Giant 1280
MIT
By rethinking the architecture of Vision Transformer (ViT), this model demonstrates its potential in image segmentation tasks.
Image Segmentation PyTorch
tue-mps
90
0
Vit Chest Xray
MIT
A fine-tuned model based on Vision Transformer (ViT) architecture for classifying chest X-rays, trained on the CheXpert dataset.
Image Classification Transformers English
codewithdark
316
1
C RADIOv2 B
Other
C-RADIOv2 is a visual feature extraction model developed by NVIDIA, offering multiple size versions suitable for image understanding and dense visual tasks.
Transformers
nvidia
404
8
Fairface Age Image Detection
Apache-2.0
An image classification model based on Vision Transformer architecture, pre-trained on the ImageNet-21k dataset, suitable for multi-category image classification tasks
Image Classification Transformers
dima806
76.6M
10
Plant Identification Vit
Apache-2.0
A plant identification model fine-tuned from Google's Vision Transformer (ViT), achieving 80.96% accuracy on the evaluation set
Image Classification Transformers
marwaALzaabi
37
1
Vit Base Patch32 Clip 224.laion2b E16
MIT
Vision Transformer model trained on the LAION-2B dataset, supporting zero-shot image classification tasks
Image Classification
timm
7,683
0
Dust3r ViTLarge BaseDecoder 512 Dpt
DUSt3R is a model for geometric 3D vision that reconstructs 3D scenes from one or more images.
3D Vision
naver
46.93k
14
Dust3r ViTLarge BaseDecoder 512 Linear
DUSt3R is a deep learning model that generates 3D geometric models from images for geometric 3D vision tasks.
3D Vision Safetensors
naver
313
0
Dust3r ViTLarge BaseDecoder 224 Linear
DUSt3R is a model for geometric 3D vision that reconstructs 3D scenes from one or more images.
3D Vision Safetensors
naver
1,829
0
Cvlface Adaface Vit Base Kprpe Webface12m
MIT
Face recognition model based on keypoint relative position encoding, using ViT architecture and trained on the WebFace12M dataset
Face-related Transformers English
minchul
122
1
Finetuned Clothes
Apache-2.0
A clothing classification model fine-tuned from Google's ViT model, supporting image classification for 7 clothing categories
Image Classification Transformers
samokosik
50
2
Skin Cancer Image Classification
Apache-2.0
Vision Transformer (ViT)-based skin cancer image classification model capable of identifying 7 types of skin lesions
Image Classification Transformers
Anwarkh1
3,309
22
Vogue Fashion Collection 15
Apache-2.0
A fashion collection classification model fine-tuned from Google's Vision Transformer (ViT), capable of recognizing clothing collections from 15 top fashion brands.
Image Classification Transformers
tonyassi
38
6
Deepfake Vs Real Image Detection
Apache-2.0
An image classification model based on Vision Transformer architecture, used to detect real images versus AI-generated fake images.
Image Classification Transformers
dima806
129.66k
27
Organoids Prova Organoid
Apache-2.0
An image classification model fine-tuned from Google's ViT-base-patch16-224 on an image folder dataset, achieving 85.76% accuracy on the evaluation set.
Image Classification Transformers
gcicceri
25
1
Driver Drowsiness Detection
Apache-2.0
Driver fatigue detection model based on ViT architecture, fine-tuned on the UTA RLDD dataset with an accuracy of 97.5%
Image Classification Transformers
chbh7051
131
2
Clip Vit Large Patch14 Finetuned Fruits 360 Vitlarge
High-precision fruit image classification model fine-tuned on the Fruits-360 dataset based on CLIP ViT-Large
Image Classification Transformers
AnneMarie1
29
0
Helicopters Vit
A helicopter image classification model based on the Vision Transformer architecture, capable of identifying different types of helicopters
Image Classification Transformers
Flightworks
18
0
Vit Model
A ViT model fine-tuned on the preprocessed 1024 configuration dataset for image classification tasks
Image Classification Transformers
mm-ai
19
0
Hq Fer2013notestaugm
Apache-2.0
A fine-tuned image classification model based on ViT architecture, excelling on the FER2013 dataset
Image Classification Transformers
Piro17
17
0
Large Algae Vit Rgb
Apache-2.0
A model based on the Vision Transformer (ViT) architecture for classifying algae images.
Image Classification Transformers
samitizerxu
39
0
Gender
Apache-2.0
A model fine-tuned based on facebook/deit-small-patch16-224, with no specific use case clearly stated
Image Classification Transformers
ivensamdh
35
1
Google Vit Base Patch16 224 Cartoon Emotion Detection
Apache-2.0
A fine-tuned cartoon image emotion classification model based on Google Vision Transformer (ViT) architecture, achieving 88% accuracy on the test set
Image Classification Transformers
jayanta
25
4
Vit Base Beans
Apache-2.0
An image classification model fine-tuned on the beans dataset based on Google's ViT model, achieving an accuracy of 96.99%
Image Classification Transformers
simlaharma
22
0
Vit Large Patch32 224.orig In21k
Apache-2.0
An image classification model based on Vision Transformer (ViT) architecture, pretrained on the ImageNet-21k dataset, suitable for feature extraction and fine-tuning scenarios.
Image Classification Transformers
timm
771
0
Vit Base Patch16 224 In21k Finetuned Cifar10 Album Vitvmmrdb Make Model Album Pred
Apache-2.0
A Vision Transformer (ViT) based model fine-tuned on the CIFAR-10 dataset for image classification tasks
Image Classification Transformers
venetis
30
0
Ast Finetuned Audioset 12 12 0.447
Bsd-3-clause
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset, using ViT architecture to process audio spectrograms, achieving excellent performance on multiple audio classification benchmarks.
Audio Classification Transformers
MIT
25
0
Vit Base DogSick
Apache-2.0
A visual classification model fine-tuned from Google's ViT base model, suitable for domain-specific image recognition tasks
Image Classification Transformers
jungjongho
29
0
Vit Base Patch16 384 Wi3
Apache-2.0
Fine-tuned model based on Google Vision Transformer (ViT) architecture, suitable for image classification tasks
Image Classification Transformers
Imene
21
0
Yolos Small Rego Plates Detection
Apache-2.0
A small vision Transformer model based on the YOLOS architecture, fine-tuned specifically for license plate detection tasks
Object Detection Transformers
nickmuchi
400
5
Yolos Small Finetuned Masks
Apache-2.0
A small-scale Vision Transformer model based on YOLOS architecture, fine-tuned specifically for mask detection tasks, trained on COCO and mask detection datasets
Object Detection Transformers
nickmuchi
153
1
Vit Test 1 95
This is an image classification model based on the Vision Transformer architecture, achieving an accuracy of 95.02%.
Image Classification Transformers
25khattab
15
0
Test
Apache-2.0
An image classification model fine-tuned from facebook/deit-tiny-patch16-224 on an image folder dataset
Image Classification Transformers
flyswot
19
0